{
 "metadata": {
  "name": "Numba Parakeet Cython"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Numba vs. Parakeet vs. Cython"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "*This notebook is derived from a blog *\n",
      "[*post*](http://jakevdp.github.io/blog/2012/08/24/numba-vs-cython/)\n",
      "*by Jake Vanderplas on the blog*\n",
      "[*Pythonic Perambulations*](http://jakevdp.github.io) and updated by Olivier Grisel to add Parakeet."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "X = np.random.random((1000, 3))\n",
      "X_wide = np.random.random((1000, 100))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Numpy Function With Broadcasting"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def pairwise_numpy(X):\n",
      "    return np.sqrt(((X[:, None, :] - X) ** 2).sum(-1))\n",
      "%timeit pairwise_numpy(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 64.7 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Pure Python Function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def pairwise_python(X):\n",
      "    M = X.shape[0]\n",
      "    N = X.shape[1]\n",
      "    D = np.empty((M, M), dtype=np.float)\n",
      "    for i in range(M):\n",
      "        for j in range(M):\n",
      "            d = 0.0\n",
      "            for k in range(N):\n",
      "                tmp = X[i, k] - X[j, k]\n",
      "                d += tmp * tmp\n",
      "            D[i, j] = np.sqrt(d)\n",
      "    return D"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_python(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 9.51 s per loop\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Alternative python / numpy implementation closer to the parakeet example from the `examples` folder of its git repo to be fair."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def pairwise_python2(X):\n",
      "    n_samples = X.shape[0]\n",
      "    result = np.zeros((n_samples, n_samples), dtype=X.dtype)\n",
      "    for i in xrange(X.shape[0]):\n",
      "        for j in xrange(X.shape[0]):\n",
      "            result[i, j] = np.sqrt(np.sum((X[i, :] - X[j, :]) ** 2))\n",
      "    return result"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_python2(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 18.2 s per loop\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#np.allclose(pairwise_python(X), pairwise_python2(X))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Numba Wrapper"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note: I did not use master as I get a `TypeError: 'numba.numbawrapper.NumbaCompiledWrapper' object is not callable` when calling it."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numba\n",
      "\n",
      "numba.__version__"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "'0.9.0'"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from numba import double\n",
      "from numba.decorators import jit, autojit\n",
      "\n",
      "pairwise_numba = autojit(pairwise_python)\n",
      "\n",
      "%timeit pairwise_numba(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 6.72 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_numba(X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 97.3 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pairwise_numba2 = autojit(pairwise_python2)\n",
      "\n",
      "%timeit pairwise_numba2(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 13.9 s per loop"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Parakeet Wrapper"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Parakeet is installed from the master branch of the git repo on Jul. 3 2013"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from parakeet import jit\n",
      "\n",
      "pairwise_parakeet = jit(pairwise_python)\n",
      "\n",
      "%timeit pairwise_parakeet(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 12.3 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_parakeet(X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 101 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pairwise_parakeet2 = jit(pairwise_python2)\n",
      "%timeit pairwise_parakeet2(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "1 loops, best of 3: 13 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_parakeet2(X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 103 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.allclose(pairwise_parakeet(X), pairwise_parakeet2(X))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.allclose(pairwise_parakeet(X_wide), pairwise_parakeet2(X_wide))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Optimized Cython Function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!cython --version"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Cython version 0.19.1\r\n"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%load_ext cythonmagic"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%cython\n",
      "\n",
      "import numpy as np\n",
      "cimport cython\n",
      "from libc.math cimport sqrt\n",
      "\n",
      "@cython.boundscheck(False)\n",
      "@cython.wraparound(False)\n",
      "def pairwise_cython(double[:, ::1] X):\n",
      "    cdef int M = X.shape[0]\n",
      "    cdef int N = X.shape[1]\n",
      "    cdef double tmp, d\n",
      "    cdef double[:, ::1] D = np.empty((M, M), dtype=np.float64)\n",
      "    for i in range(M):\n",
      "        for j in range(M):\n",
      "            d = 0.0\n",
      "            for k in range(N):\n",
      "                tmp = X[i, k] - X[j, k]\n",
      "                d += tmp * tmp\n",
      "            D[i, j] = sqrt(d)\n",
      "    return np.asarray(D)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_cython(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 6.57 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit pairwise_cython(X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 95.5 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Fortran/F2Py"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%file pairwise_fortran.f\n",
      "\n",
      "      subroutine pairwise_fortran(X,D,m,n)\n",
      "          integer :: n,m\n",
      "          double precision, intent(in) :: X(m,n)\n",
      "          double precision, intent(out) :: D(m,m) \n",
      "          integer :: i,j,k\n",
      "          double precision :: r \n",
      "          do i = 1,m \n",
      "              do j = 1,m \n",
      "                  r = 0\n",
      "                  do k = 1,n \n",
      "                      r = r + (X(i,k) - X(j,k)) * (X(i,k) - X(j,k)) \n",
      "                  end do \n",
      "                  D(i,j) = sqrt(r) \n",
      "              end do \n",
      "          end do \n",
      "      end subroutine pairwise_fortran"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Overwriting pairwise_fortran.f\n"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Compile the Fortran with f2py.\n",
      "# We'll direct the output into /dev/null so it doesn't fill the screen\n",
      "!f2py -c pairwise_fortran.f -m pairwise_fortran > /dev/null"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from pairwise_fortran import pairwise_fortran\n",
      "XF = np.asarray(X, order='F')\n",
      "%timeit pairwise_fortran(XF)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 10.8 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "XF_wide = np.asarray(X_wide, order='F')\n",
      "%timeit pairwise_fortran(XF_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 111 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Scipy Pairwise Distances"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.spatial.distance import cdist\n",
      "%timeit cdist(X, X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 7.37 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit cdist(X_wide, X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 97.6 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 28
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Scikit-learn Pairwise Distances"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.metrics import euclidean_distances\n",
      "%timeit euclidean_distances(X, X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "100 loops, best of 3: 16.2 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 29
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%timeit euclidean_distances(X_wide, X_wide)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "10 loops, best of 3: 22.4 ms per loop\n"
       ]
      }
     ],
     "prompt_number": 30
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Remarks and analysis"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- This was run on a macbook air 2012 2Ghz Core i7 with the default system blas implementation (no MKL) for numpy\n",
      "- Some of the timings vary quite a lot from Jake's original post.\n",
      "- Numba seems to be able to go twice faster than Parakeet when `n_features` is small (e.g. 3 in Jake's original setting)\n",
      "- Numba fails to optimize the python version that uses the numpy notation to compute distances on pairs of rows\n",
      "- Maybe calling numba `nopython=True` would catch this but I did not understand how to add this option and make the first example work so I am not sure how to use that option correctly \n",
      "- Parakeet is almost as fast as numba when `n_features` grows to more realistic sizes (e.g. 100)\n",
      "- Parakeet can work as efficiently with the numpy row slice expression without any issue which allow for a more natural and concise syntax.\n",
      "- Blas (as used in the scikit-learn implementation) is still a killer as soon as all the dimensions are not small (note: the scikit-learn implementation can be less numerically stable though)"
     ]
    }
   ],
   "metadata": {}
  }
 ]
}